Biological Knowledge Integration in DNA Microarray Gene Expression Classification Based on Rough Set Theory
Abstract
DNA microarrays have contributed to the exponential growth of genetic data over the years. This large amount of gene expression data has been used in research seeking to diagnose diseases such as cancer using classification methods. In turn, explicit biological knowledge about gene functions has also grown tremendously over the last decade. This work integrates explicit biological knowledge into the classification process using Rough Set Theory, making it more effective. In addition, the proposed model is able to indicate which part of the biological knowledge has been important for classification. The classification process is divided into five steps. First, supergenes are created using Principal Component Analysis; they summarize the information in the intersections of sets of genes (called basic categories) taken from biological knowledge. Second, the continuous values of the supergenes are discretized using Discriminant Fuzzy Patterns. Third, the most relevant supergenes are selected using the criterion of maximum β-relevance, supported by Rough Set Theory. Fourth, decision rules are generated using the CAI (Conjuntos Aproximados con Incertidumbre) model; these rules are the basis of the final classifier. Finally, a classifier is constructed from the decision rules generated in the previous step, which are applied in an order determined by a score. The proposed model is evaluated on a set of DNA microarray samples together with explicit biological knowledge, expressed as sets of genes that may or may not be related to the concept being classified, and it obtains successful results compared with well-known classification techniques.
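The first step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a basic category is the group of genes sharing the same membership pattern across the given gene sets, and that a supergene is the first principal component score of the samples over those genes. The function names `basic_categories` and `supergene`, the gene identifiers, the pathway names, and the toy expression matrix are all hypothetical.

```python
from collections import defaultdict

import numpy as np


def basic_categories(gene_sets, genes):
    """Group genes by their membership pattern across the gene sets.

    Assumption: each distinct pattern corresponds to one intersection of
    gene sets, used here as a stand-in for a "basic category".
    """
    groups = defaultdict(list)
    for gene in genes:
        signature = frozenset(
            name for name, members in gene_sets.items() if gene in members
        )
        if signature:  # skip genes not covered by any gene set
            groups[signature].append(gene)
    return dict(groups)


def supergene(expr, gene_index, members):
    """Return the first principal component score of each sample."""
    cols = [gene_index[g] for g in members]
    block = expr[:, cols]
    centered = block - block.mean(axis=0)
    # The rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]


# Hypothetical toy data: 6 samples (rows) by 4 genes (columns).
genes = ["g1", "g2", "g3", "g4"]
gene_sets = {
    "pathway_A": {"g1", "g2", "g3"},
    "pathway_B": {"g2", "g3", "g4"},
}
rng = np.random.default_rng(0)
expr = rng.normal(size=(6, len(genes)))
gene_index = {g: i for i, g in enumerate(genes)}

categories = basic_categories(gene_sets, genes)
supergenes = {
    signature: supergene(expr, gene_index, members)
    for signature, members in categories.items()
}
```

Each resulting supergene is one continuous value per sample. In the pipeline described in the abstract these values would then be discretized, which this sketch does not attempt.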
Similar resources
Integration and Reduction of Microarray Gene Expressions Using an Information Theory Approach
The DNA microarray is an important technique that allows researchers to analyze large amounts of gene expression data in parallel. Although the data can be more significant if they come from separate experiments, one of the most challenging phases in the microarray context is the integration of separate expression level datasets that have been gathered through different techniques. In this paper, we prese...
Using Variable Precision Rough Set for Selection and Classification of Biological Knowledge Integrated in DNA Gene Expression
DNA microarrays have contributed to the exponential growth of genomic and experimental data in the last decade. This large amount of gene expression data has been used by researchers seeking diagnosis of diseases like cancer using machine learning methods. In turn, explicit biological knowledge about gene functions has also grown tremendously over the last decade. This work integrates explicit ...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression gives access to a wealth of information with thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and differences between tumors. In this study we try to reduce high-dimensional data by a statistical method to select valuable genes with high impact as biomarkers and then classify ovarian tumors based on gene expression data of...
Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets
With the advancement of metagenomics, data mining has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, gene selection (feature selection) is essential. Removing redundant genes can increase the speed and accuracy of classification. After applying the gene se...
Scalable Sequential Rough Parallel Bounded Symmetrical Clustering for Gene Expression Profile Analysis
The study of gene expression profiling of tissues and cells has become a major tool for discovery in medicine. Identification of co-expressed genes and coherent patterns is the central goal in gene expression profiling and an important task in bioinformatics research. Clustering is an important unsupervised learning technique for gene expression profile analysis. Many conventional...
Publication date: 2012